JIT: forward-sub cheap address temps with multiple uses#129312
Conversation
|
For the canonical repro: struct FooS { public int a, b; }
class StructContainer
{
FooS foo;
public void Update() { foo.a++; foo.b++; }
}x64 codegen (FullOpts), 12 bytes / 4 insns → 6 bytes / 2 insns: ; Before
lea rax, [rcx+0x08]
inc dword ptr [rax]
add rcx, 12
inc dword ptr [rcx]
; After
inc dword ptr [rcx+0x08]
inc dword ptr [rcx+0x0C]Note This comment was authored with the assistance of GitHub Copilot CLI. |
|
@dhartglassMSFT PTAL |
There was a problem hiding this comment.
Pull request overview
This PR extends the JIT’s forward-substitution phase to allow substituting certain cheap “address temp” expressions at multiple use sites (not just a single last-use), primarily to unblock downstream address-mode containment in Lowering. It also factors out a common GT_FIELD_ADDR-peeling loop into Compiler::gtPeelFieldAddrs and reuses it from a couple of other call sites.
Changes:
- Add
Compiler::gtPeelFieldAddrsand use it fromimpIsAddressInLocalandfgAddrCouldBeHeap. - Add a multi-use forward-substitution path for “cheap reorderable address trees”, implemented via
fgForwardSubMultiUse. - Update
compiler.hwith new helper/method declarations.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| src/coreclr/jit/promotiondecomposition.cpp | Adds Compiler::gtPeelFieldAddrs helper implementation. |
| src/coreclr/jit/importer.cpp | Replaces local FIELD_ADDR peeling with gtPeelFieldAddrs. |
| src/coreclr/jit/forwardsub.cpp | Adds cheap-address detection and multi-use forward-substitution logic. |
| src/coreclr/jit/flowgraph.cpp | Reuses gtPeelFieldAddrs in fgAddrCouldBeHeap. |
| src/coreclr/jit/compiler.h | Declares gtPeelFieldAddrs and fgForwardSubMultiUse. |
4819780 to
eba75bf
Compare
eba75bf to
b1a0c81
Compare
Substitute a FIELD_ADDR chain rooted at a local at every use site so the dup-spill temp for field updates no longer blocks address-mode containment in Lowering. Also refactor the FIELD_ADDR walk into gtPeelFieldAddrs and share it with fgAddrCouldBeHeap and impIsAddressInLocal. The multi-use path pre-allocates every clone before mutating the IR (so a gtCloneExpr failure bails cleanly), caps the use count at four to keep the targeted patterns in scope, and conservatively clears GTF_VAR_DEATH bits on every LCL_VAR in the inserted copies. The peel helper has a const overload so callers like impIsAddressInLocal stay const-safe. Fixes dotnet#67187. > [!NOTE] > This change was authored with the assistance of GitHub Copilot CLI. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
b1a0c81 to
71a0a06
Compare
JulieLeeMSFT
left a comment
There was a problem hiding this comment.
LGTM including removing all GTF_VAR_DEATH_MASK.
Per @JulieLeeMSFT: the bail message also fires on > MaxUses, not just on too few uses. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Clarify fgIsCheapReorderableAddressTree Returns: doc to make the zero-hop (bare LCL_VAR/LCL_ADDR) case explicit. - Generalize the JITDUMP bail message in the multi-use call site; fgForwardSubMultiUse can fail for multiple reasons. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
|
Helps arm64 more than I would have expected. |
|
/ba-g The job running on agent NetCore-Public 131 ran longer than the maximum time of 300 minutes |
Substitute a FIELD_ADDR chain rooted at a local at every use site so the dup-spill temp for field updates no longer blocks address-mode containment in Lowering. Also refactor the FIELD_ADDR walk into gtPeelFieldAddrs and share it with fgAddrCouldBeHeap and impIsAddressInLocal.
Fixes #67187.
Note
This change was authored with the assistance of GitHub Copilot CLI.